A Ribonucleotide ↔ Phosphoramidate Reaction Network Optimized by Computer-Aided Design

A growing number of out-of-equilibrium systems have been created and investigated in chemical laboratories over the past decade. One way to achieve this is to create a reaction cycle, in which the forward reaction is driven by a chemical fuel and the backward reaction follows a different pathway. Such dissipative reaction networks are still relatively rare, however, and most non-enzymatic examples are based on the carbodiimide-driven generation of carboxylic acid anhydrides. In this work, we describe a dissipative reaction network that comprises the chemically fueled formation of phosphoramidates from natural ribonucleotides (e.g., GMP or AMP) and phosphoramidate hydrolysis as a mild backward reaction. Because the individual reactions are subject to a multitude of interconnected parameters, the software-assisted tool “Design of Experiments” (DoE) was a great asset for optimizing and understanding the network. One notable insight was the stark effect of the nucleophilic catalyst 1-ethylimidazole (EtIm) on the hydrolysis rate, which is reminiscent of the action of the histidine group in phosphoramidase enzymes (e.g., HINT1). We were also able to use the reaction cycle to generate transient self-assemblies, which were characterized by dynamic light scattering (DLS), confocal microscopy (CLSM), and cryogenic transmission electron microscopy (cryo-TEM). Because these compartments are based on prebiotically plausible building blocks, our findings may have relevance for origin-of-life scenarios.


General
All chemicals were purchased from Merck Sigma Aldrich, TCI, AcrosOrganics, FisherScientific, VWR or chemPUR and were used without further purifications. In the case of guanosine 5'-monophosphate, the disodium salt hydrate was purchased. The guanosine 5'-monophosphate content (without crystal water) was determined by quantitative NMR (with internal standard) to be 80%. Since EDC is hygroscopic and susceptible to hydrolysis, it was stored in the freezer and its purity was determined regularly by quantitative HPLC and found to be around 99%.
Transmission Electron Microscopy (TEM) measurements were performed on a ZEISS EM10 microscope with an acceleration voltage of 120 kV.
Cryo-TEM grids were analyzed on a JEM-2100F (JEOL) at 200 kV and a temperature of -150 °C using a Gatan cryo-holder.
Dynamic light scattering (DLS) was performed on a Nano-Zetasizer (Malvern Instruments) at 25 °C with a 173° backscatter angle at λ = 633 nm.
Confocal laser scanning microscopy (CLSM) images were recorded with a TCS SP8 confocal microscope using the Leica Application Suite X (LASX) with the LIGHTNING wizard.

Objectives and Responses
The hydrolysis of Bn-GMP was investigated, focusing on revealing general trends and optimizing reaction conditions. More specifically, the overall goal of the DOE was to achieve nearly complete hydrolysis (Bn-GMP content < 10-15%) within 24 hours.
Based on this objective the response of interest was chosen to be the relative amount of unhydrolyzed Bn-GMP. The amount was determined by N,N,N',N',N',7benzenedimethanammonium dibromide was used as internal standard. Here, a calibration was conducted where different concentration ratios of the internal standard and Bn-GMP were plotted against the respective intensity ratios. The calibration and the respective chromatograms are depicted in Figure S1 and Table S1.

Input factors
The input factors used in the designed experiment are shown in Table S2 and Figure S2.  From previous experiments it was known that the response will not behave in a linear manner based on factor changes. Therefore, an experimental design sufficient of filling a full quadratic description model was used as starting point. In concrete terms this means a CCD design was chosen, where in the first block 20 runs (containing a full factorial design with additional 4 center points) and in the second block 10 runs (containing the axial star points with two additional center points) were conducted as shown in Table S3. Data Analysis Since the system of interest represents a chemical reaction, the response is limited between boundaries of 0 and 100%. For this reason, the data were at first transformed via a logit function. Based on this, the description model was chosen by applying an AICc model reduction algorithm of all not aliased terms available. Table S4 shows the respective model and the analysis of variance. The regression model and its individual terms (factor effects and interactions) except A 2 and AD are statistically significant and exhibit large F and low p-values. This indicates that the variation explained by the model and its individual terms are large compared to the unexplained variance of the Residual. The model terms A 2 and AD were kept in the model for hierarchy reasons. The high F and low p value of the lack-of-fit test imply that the Lack of Fit is not significant. Further Model statistics are shown in Table S5. The predicted coefficient of determination (Predicted R²) is close to 1 which is caused by a low predicted residual sum of squares (PRESS). This means that the variation explained by the model in "new data" is high, indicating that the model is suited for making predictions. The same holds true for the adjusted coefficient of determination (Adjusted R²) which measures the variation around the mean explained by the regression taking a penalty for an increasing number of model parameters into account. The difference between Adjusted R² and Predicted R² is less than 0.2, and therefore in reasonable agreement. The adequate precision can be regarded as a signal-to-noise ratio, where the predicted values of the design points are compared to the average prediction error. As a rule of thumb, a ratio greater than four is desirable which means that the observed ratio of 48.28 indicates an adequate signal. The regression model with the coded factors is depicted in Table S6.  Figure S3A-E displays diagnostic graphs of the description model. Figure S3A represents the normal Plot of Residuals. Since all data points are following a straight line, it can be assumed that the residuals are normally distributed. Random Fluctuations within the data are not supposed to be systematic and are expected therefore to follow a normal distribution. Potential outliers (not caused by random fluctuations) would have been detected within this plot by strongly deviating from this line. Figure S3B shows the predicted and transformed response plotted against their residuals. The data points are randomly scattered, indicating no expanding variance. Figure S3C shows the calculated Residuals against the individual run number. The residuals are randomly scattered indicating that no time depending uncontrolled variables are apparent. Figure S3D depicts the Residuals within each block. Both blocks exhibit similar ranges of scattering with no specific pattern apparent. Figure S3E shows an intuitive way of looking at the performance of the description model in terms of a comparison of predicted and observed response values. A major part of the data variation is explained by the description model. The data scatters around the first bisector. This means that observed responses are in good agreement with the predicted ones. Figure S3F represents the Cook's distance and is an example of an influence graph. In this plot, the difference in the predictions of the respective data with and without the run under consideration is calculated, squared, and put in proportion to the estimated variance and the model constant. Thus, the Cook's distance directly shows how much a measured value or potential outlier influences the whole model. Since all runs were below the threshold and no data point stood out in the normal plot either, no further runs were considered as outliers.  Figure S4 shows the graphical optimization process of the designed experiment after 24 h and with 5 equiv. of EtIm. In the yellow area the predicted response is smaller than 15%. The adjacent brown area indicates where the 95% confidence interval is below 15%. On this basis, three Confirmation runs (C1, C2 and C3) were conducted (see Figure 2C in the manuscript). Optimizing the formation of Bn-GMP by DoE

Objectives and responses
The second DoE dealt with the formation of Bn-GMP and the optimization of the forward reaction. In concrete terms this means that the objective was to evaluate if this reaction is suitable to obtain more than at least 50% Bn-GMP.
Based on this objective, the response of interest was chosen to be the relative amount of Bn-GMP formed. RNA and DNA quantification is commonly performed at 260 nm (e.g. OD260 measurements). 2 It was assumed that at this wavelength the only significant absorbing moiety within our system is the purine unit of the nucleotides. To verify this assumption, the Bn-GMP amount determined by internal standard calibration of the previous DOE was compared with the relative area percentage of the Bn-GMP peak in the chromatogram. As depicted in Table S7, the relative amount of Bn-GMP formed in the previous DOE was nearly identical with the relative area percentage of the Bn-GMP in the chromatogram at 260 nm, corroborating the validity of our assumption. Based on this observation, it was assumed that the amount of Bn-GMP and other species present can be reasonably determined as relative area percentages in the chromatogram at 260 nm. To further prove the validity of this assumption and to exclude that other species (that were potentially not observed in the chromatographic analysis) are present, a test run was performed to determine the relative amount of Bn-GMP and GMP using quantitative 31 P NMR (with TMP as an internal standard).
The results were compared with the relative area percentages of the respective LC-MS analysis (see Table S8). Even when using a different analytical technique to determine the amount of each species, the relative area percentages in the chromatogram are well comparable to the ones determined by this method. The deviations are within the necessary accuracy for analyzing the reaction network. Furthermore, it becomes apparent that the species observed in the 31 P spectra are similar to those monitored in the respective chromatograms, indicating that no (potentially new) species were overlooked that could cause an error in the chromatographic analysis. Figure S5: 31 P NMR spectra of Bn-GMP.

Input factors
The input factors used in the designed experiment are shown in Table S9 and Figure S6. From previous experiments it was known that factor changes will not result in a linear change of the response. Therefore, it was assumed that a linear description model would be not sufficient to represent the considered system well. As starting point of the DOE an experimental design was used that is suitable to fill a full quadratic description model. In concrete terms this means a CCD design was chosen, where in the first two blocks 40 runs (containing a full factorial design with additional 8 center points) and the third block 13 runs (containing the axial star points and additional 3 center points) were conducted. Since a full quadratic model was not suited to describe the present system, the experimental design was augmented by optimal exchange methods (Point exchange using an Ioptimality criterion). As depicted in Table S10, this resulted in an experimental design of 87 runs suitable to fill a full cubic model.

Data Analysis
Since the forward reaction represents a chemical reaction, the response is limited between the boundaries of 0 and 100%. For this reason, the data was transformed via a logit function. Based on this, the description model was chosen by applying an AICc model reduction algorithm of all not aliased terms available. Table S11 shows the analysis of variance of the respective model. The regression model and most of its individual terms (factor effects and higher order interactions) are statistically significant and exhibit large F and low p-values. This indicates that the variation explained by the model as well as its individual terms are quite large compared to the unexplained variance of the Residual. Seemingly insignificant model terms like BD and D were kept within the model for hierarchy reasons. Conversely, the high F and low p-value of the lack-of-fit test imply that the Lack of Fit is not significant. Further Model statistics are shown in Table S12. The predicted coefficient of determination (Predicted R²) is close to 1 which is caused by a low PRESS. This means that the variation explained by the model in "new data" is high, indicating that the model is suited for making predictions. The same holds true for the adjusted coefficient of determination (Adjusted R²) which measures the variation around the mean explained by the regression taking a penalty for an increasing number of model parameters into account. The difference between Adjusted R² and Predicted R² is less than 0.2 meaning that they are in reasonable agreement. The adequate precision can be regarded as a signalto-noise ratio, where the predicted values of the design points are compared to the average prediction error. As a rule of thumb, a ratio greater than 4 is suited which means that the observed ratio of 578.089 indicates an adequate signal.

57.8089
The regression model with the coded factors is shown in Table S13.  Figure S7A-E displays the diagnostic graphs of the model. Figure S7A depicts the normal Plot of Residuals. Since all data points are following a straight line, it can be assumed that the residuals are normally distributed. Random Fluctuations within data are not supposed to be systematic and must therefore follow a normal distribution. A potential outlier (not caused by random fluctuations) would have been detected here by deviating from this line. Figure S7B shows the predicted transformed response plotted against their residuals. The data points are randomly scattered indicating no expanding variance. Figure S7C shows the calculated Residuals against the individual run number. The residuals are randomly scattered indicating that no time depending uncontrolled variables are apparent. Figure S7D depicts the Residuals within each block. Both blocks exhibit similar ranges of scattering with no specific pattern apparent. Figure S7E depicts an intuitive way of looking at the performance of the description model in terms of a comparison of predicted and observed response values. A great part of the variation of the data is explained by the description model. The data scatters slightly around the first bisector. This means that observed responses are in good agreement with the predicted ones. Figure S7F represents Cook's distance and is an example of an influence graph. In this plot, the difference in the predictions of the respective data with and without the run under consideration is calculated, squared, and put in proportion to the estimated variance and the model constant. Thus, the Cook's distance directly shows how much a measured value or potential outlier influences the whole model. Since all runs were below the threshold and no data point stood out in the normal plot either, no further runs were considered as outliers.  Figure S9 shows the graphical optimization process of the designed experiment after 60 min, at 55 °C, with 5 equiv. EtIm and with 40 mM GMP. In the yellow area the predicted response (amount of Bn-GMP) is larger than 50%. The adjacent brown area indicates where the 95% confidence interval is larger than 50%. On this basis, three Confirmation runs (C1, C2 and C3) were conducted (see Figure 3C in the manuscript).

Further experiments
Bn-GMP Reaction cycle at Room Temperature 16.3 mg (40 µmol, 40 mM) guanosine 5'-monophosphate (GMP) and 6.5 µL (60 µmol, 60 mM, 1.5 equiv.) benzylamine were dissolved in 1 mL of a freshly prepared aqueous reaction buffer (200 mM 3morpholinopropionic acid, 200 mM 2-morpholin-4-ylethanesulfonic acid and 200 mM 1-ethylimidazole (5 equiv.)). The pH was adjusted to pH = 6.1 at room temperature with conc. hydrochloric acid. 15.3 mg (80 µmol, 2 equiv.) 1-ethyl-3-(3-dimethylaminopropyl)-carbodiimide hydrochloride (EDC) were added at the beginning to start the reaction. The reaction mixture was stirred at 55 °C in a tightly closed vial. If the reaction is performed at 25 °C, the formation of Bn-GMP is still rather efficient. The hydrolysis, however, is slowed down significantly with a half-life of the Bn-GMP of around 80 h. This corroborates, that the transient self-assembly could in principle be performed without cycling between two temperatures, albeit with very slow deactivation. Continuous addition of the fuel EDC leads to an out-of-equilibrium steady state.

HPLC Analysis
The reaction cycle was run according to the procedure described above. For HPLC, 5 µL were dissolved in 995 µL MeCN/H2O (

Dynamic Light Scattering (DLS)
Samples for DLS were filtered through a 0.45 µm syringe filter before starting the first cycle according to the procedure described above. Dynamic light scattering was performed 1.5 h and 24 h after start of each cycle on a Nano-Zetasizer (Malvern Instruments) at 25 °C with a 173° backscatter angle at λ = 633 nm. At each time point, ten DLS runs were measured. The number mean of five representative measurements each are shown below. The hydrodynamic diameter was obtained as an average of these five measurements while the standard deviations was calculated from the same set of measurements.

Confocal Laser Scanning Microscopy (CLSM)
The reaction cycle was run according to the procedure described above. Before addition of EDC, 1.5 h and 24 h after starting the reaction cycle, samples for CLSM were prepared as follows: 0.5 µL of a 5 mM solution of C153 in THF were added to 1 mL of the sample (final concentration of C153 = 2.5 µM) at 25 °C. A drop of the prepared sample was placed on a freshly cleaned microscope glass slide with a 1 mm well and covered with a cover glass slip. CLSM images were recorded with a TCS SP8 confocal microscope using the Leica Application Suite X (LASX). The fluorescent dye was excited at 405 nm.

Transmission Electron Microscopy (TEM)
The reaction cycle was run according to the procedure described above. 5 µL of the reaction mixture before (control) and 1.5 h after (aggregated) addition of EDC were deposited on a copper grid at 25 °C. Negative staining with uranyl acetate was performed. TEM measurements were performed on a ZEISS EM10 microscope with an acceleration voltage of 120 kV.
Control experiments showed no aggregates inside the black areas of the stain. After 1.5 h, spherical objects were observed. Their size around 50 -100 nm differed from the size of the aggregates observed in solution (~ 3 µm), which indicates that they collapse on the surface when being dried.

Cryogenic Transmission Electron Microscopy (cryo-TEM)
The reaction cycle was run according to the procedure described above. 5 µL of the reaction mixture before (control) and 1.5 h after (aggregated) addition of EDC were adhered to a freshly glowdischarged holey carbon grid. After blotting, the grids were vitrified in liquid ethane by a Vitrobot FP 5350/60 (FEI, Eindhoven, Netherlands). The cryo-TEM grids were analyzed on a JEM-2100F (JEOL) at 200 kV and a temperature of -150 °C using a Gatan cryo-holder.
Control experiments showed no aggregates inside the round area of the grid. After 1.5 h, spherical objects were observed. Their size of around one µm differed from the size of the aggregates observed in solution (~ 3 µm), which indicates that they shrink upon freezing. The shape and dark outline is rather indicative for vesicles.  Figure S17: D2O,298 K) of N,N,N,N',N',N',D2O,298 K) of N,N,N,N',N',N',7benzenedimethanammonium dibromide.